Results 1 - 13 of 13
1.
J Am Med Inform Assoc ; 31(3): 574-582, 2024 Feb 16.
Article in English | MEDLINE | ID: mdl-38109888

ABSTRACT

OBJECTIVES: Automated phenotyping algorithms can reduce development time and operator dependence compared to manually developed algorithms. One such approach, PheNorm, has performed well for identifying chronic health conditions, but its performance for acute conditions is largely unknown. Herein, we implement and evaluate PheNorm applied to symptomatic COVID-19 disease to investigate its potential feasibility for rapid phenotyping of acute health conditions. MATERIALS AND METHODS: PheNorm is a general-purpose automated approach to creating computable phenotype algorithms based on natural language processing, machine learning, and low-cost silver-standard training labels. We applied PheNorm to cohorts of potential COVID-19 patients from 2 institutions and used gold-standard manual chart review data to investigate the impact on performance of alternative feature engineering options and of implementing externally trained models without local retraining. RESULTS: At quantiles of model-predicted risk that maximize F1, models achieved AUC, sensitivity, and positive predictive value of 0.853, 0.879, and 0.851 at one institution and 0.804, 0.976, and 0.885 at the other. We report performance metrics for all combinations of silver labels, feature engineering options, and models trained internally versus externally. DISCUSSION: Phenotyping algorithms developed using PheNorm performed well at both institutions. Performance varied with different silver-standard labels and feature engineering options. Models developed locally at one site also worked well when implemented externally at the other site. CONCLUSION: PheNorm models successfully identified an acute health condition, symptomatic COVID-19. The simplicity of the PheNorm approach allows it to be applied at multiple study sites with substantially reduced overhead compared to traditional approaches.
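
The abstract reports sensitivity and positive predictive value at the cut point that maximizes F1. A minimal Python sketch of how such a cut point could be chosen is shown here. It is an illustration only, not the PheNorm implementation: the function name `best_f1_threshold` and the toy scores and labels are hypothetical, and the candidate thresholds are simply the observed scores rather than quantiles of predicted risk.

```python
def best_f1_threshold(scores, labels):
    """Return (threshold, f1) maximizing F1 when 'score >= threshold' means positive."""
    best_threshold, best_f1 = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when there are no positives at all
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_threshold, best_f1 = t, f1
    return best_threshold, best_f1


# Hypothetical toy data (not from the study): model scores and chart-review labels
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]
labels = [0, 0, 1, 1, 1, 0]
threshold, f1 = best_f1_threshold(scores, labels)
```

On this toy data the lowest threshold that still captures all three true cases wins, because F1 balances sensitivity against positive predictive value.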


Subject(s)
Algorithms , COVID-19 , Humans , Electronic Health Records , Machine Learning , Natural Language Processing
2.
Child Abuse Negl ; 138: 106090, 2023 04.
Article in English | MEDLINE | ID: mdl-36758373

ABSTRACT

BACKGROUND: Rates of child maltreatment (CM) obtained from electronic health records are much lower than national child welfare prevalence rates indicate. There is a need to understand how CM is documented to improve reporting and surveillance. OBJECTIVES: To examine whether using natural language processing (NLP) in outpatient chart notes can identify cases of CM not documented by ICD diagnosis code, the overlap between the coding of child maltreatment by ICD and NLP, and any differences by age, gender, or race/ethnicity. METHODS: Outpatient chart notes of children aged 0-18 years within Kaiser Permanente Washington (KPWA) 2018-2020 were used to examine a selected set of maltreatment-related terms categorized into concept unique identifiers (CUI). Manual review of text snippets for each CUI was completed to flag validated cases and retrain the NLP algorithm. RESULTS: The NLP results indicated a crude rate of 1.55% to 2.36% (2018-2020) of notes with reference to CM. The rate of CM identified by ICD code was 3.32 per 1000 children, whereas the rate identified by NLP was 37.38 per 1000 children. The groups that increased the most in identification of maltreatment from ICD to NLP were adolescents (13-18 years old), females, Native American children, and those on Medicaid. Of note, all subgroups had substantially higher rates of maltreatment when using NLP. CONCLUSIONS: Use of NLP substantially increased the estimated number of children who have been impacted by CM. Accurately capturing this population will improve identification of vulnerable youth at high risk for mental health symptoms.
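
The size of the gap between the two identification methods follows directly from the two rates quoted in the abstract. The short Python sketch below works through that arithmetic; the variable names are illustrative, and the ratio and difference are derived here rather than reported by the authors.

```python
# Rates per 1000 children, as quoted in the abstract
icd_rate_per_1000 = 3.32
nlp_rate_per_1000 = 37.38

# Relative and absolute differences (derived here, not reported in the abstract)
rate_ratio = nlp_rate_per_1000 / icd_rate_per_1000
additional_per_1000 = nlp_rate_per_1000 - icd_rate_per_1000

print(f"NLP rate is about {rate_ratio:.1f} times the ICD rate")
print(f"About {additional_per_1000:.2f} additional children per 1000 identified by NLP")
```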


Subject(s)
Child Abuse , Natural Language Processing , Female , Adolescent , Child , Humans , Newborn Infant , Infant , Preschool Child , International Classification of Diseases , Washington/epidemiology , Electronic Health Records
3.
Sci Rep ; 13(1): 1971, 2023 02 03.
Article in English | MEDLINE | ID: mdl-36737471

ABSTRACT

The electronic Medical Records and Genomics (eMERGE) Network assessed the feasibility of deploying portable phenotype rule-based algorithms with natural language processing (NLP) components added to improve performance of existing algorithms using electronic health records (EHRs). Based on scientific merit and predicted difficulty, eMERGE selected six existing phenotypes to enhance with NLP. We assessed performance, portability, and ease of use. We summarized lessons learned by: (1) challenges; (2) best practices to address challenges based on existing evidence and/or eMERGE experience; and (3) opportunities for future research. Adding NLP resulted in improved, or the same, precision and/or recall for all but one algorithm. Portability, phenotyping workflow/process, and technology were major themes. With NLP, development and validation took longer. Besides portability of NLP technology and algorithm replicability, factors to ensure success include privacy protection, technical infrastructure setup, intellectual property agreement, and efficient communication. Workflow improvements can improve communication and reduce implementation time. NLP performance varied mainly due to clinical document heterogeneity; therefore, we suggest using semi-structured notes, comprehensive documentation, and customization options. NLP portability is possible with improved phenotype algorithm performance, but careful planning and architecture of the algorithms is essential to support local customizations.


Subject(s)
Electronic Health Records , Natural Language Processing , Genomics , Algorithms , Phenotype
4.
Am J Epidemiol ; 192(2): 283-295, 2023 02 01.
Article in English | MEDLINE | ID: mdl-36331289

ABSTRACT

We sought to determine whether machine learning and natural language processing (NLP) applied to electronic medical records could improve performance of automated health-care claims-based algorithms to identify anaphylaxis events using data on 516 patients with outpatient, emergency department, or inpatient anaphylaxis diagnosis codes during 2015-2019 in 2 integrated health-care institutions in the Northwest United States. We used one site's manually reviewed gold-standard outcomes data for model development and the other's for external validation based on cross-validated area under the receiver operating characteristic curve (AUC), positive predictive value (PPV), and sensitivity. In the development site 154 (64%) of 239 potential events met adjudication criteria for anaphylaxis compared with 180 (65%) of 277 in the validation site. Logistic regression models using only structured claims data achieved a cross-validated AUC of 0.58 (95% CI: 0.54, 0.63). Machine learning improved cross-validated AUC to 0.62 (0.58, 0.66); incorporating NLP-derived covariates further increased cross-validated AUCs to 0.70 (0.66, 0.75) in development and 0.67 (0.63, 0.71) in external validation data. A classification threshold with cross-validated PPV of 79% and cross-validated sensitivity of 66% in development data had cross-validated PPV of 78% and cross-validated sensitivity of 56% in external data. Machine learning and NLP-derived data improved identification of validated anaphylaxis events.
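
The AUC values above can be read as the probability that a randomly chosen validated event receives a higher model score than a randomly chosen non-event. A minimal Python sketch of that pairwise definition follows. The function name `pairwise_auc` and the toy scores and labels are hypothetical and are not study data; the study's cross-validation procedure is not reproduced here.

```python
def pairwise_auc(scores, labels):
    """AUC as the fraction of (event, non-event) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 adjudicated events, 2 non-events
toy_scores = [0.9, 0.8, 0.3, 0.6, 0.2]
toy_labels = [1, 0, 1, 1, 0]
toy_auc = pairwise_auc(toy_scores, toy_labels)
```

An AUC of 0.5 corresponds to random ranking, which is why the rise from 0.58 to 0.70 in the development data represents a meaningful, if modest, gain in discrimination.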


Subject(s)
Anaphylaxis , Natural Language Processing , Humans , Anaphylaxis/diagnosis , Anaphylaxis/epidemiology , Machine Learning , Algorithms , Hospital Emergency Service , Electronic Health Records
5.
AMIA Annu Symp Proc ; 2023: 608-617, 2023.
Article in English | MEDLINE | ID: mdl-38222338

ABSTRACT

Physical activity is important for prostate cancer survivors. Yet survivors face significant barriers to traditional structured exercise programs, limiting engagement and impact. Digital programs that incorporate fitness trackers and peer support via social media have potential to improve the reach and impact of traditional support. Using a digital walking program with prostate cancer survivors, we employed mixed methods to assess program outcomes, engagement, perceived utility, and social influence. After 6 weeks of program use, survivors and loved ones (n=18) significantly increased their average daily step count. Although engagement and perceived utility of using a fitness tracker and interacting with walking buddies was high, social media engagement and utility were limited. Group strategies associated with social influence were driven more by group attraction to the collective task of walking than by interpersonal bonds. Findings demonstrate the feasibility of a digital walking program to improve physical activity and extend the reach of traditional support.


Subject(s)
Cancer Survivors , Prostatic Neoplasms , Male , Humans , Prostate , Exercise , Prostatic Neoplasms/therapy , Walking , Survivors
6.
BMC Med Inform Decis Mak ; 22(1): 129, 2022 05 12.
Article in English | MEDLINE | ID: mdl-35549702

ABSTRACT

BACKGROUND: Patients and their loved ones often report symptoms or complaints of cognitive decline that clinicians note in free clinical text, but no structured screening or diagnostic data are recorded. These symptoms/complaints may be signals that predict who will go on to be diagnosed with mild cognitive impairment (MCI) and ultimately develop Alzheimer's Disease or related dementias. Our objective was to develop a natural language processing system and prediction model for identification of MCI from clinical text in the absence of screening or other structured diagnostic information. METHODS: There were two populations of patients: 1794 participants in the Adult Changes in Thought (ACT) study and 2391 patients in the general population of Kaiser Permanente Washington. All individuals had standardized cognitive assessment scores. We excluded patients with a diagnosis of Alzheimer's Disease, Dementia or use of donepezil. We manually annotated 10,391 clinic notes to train the NLP model. Standard Python code was used to extract phrases from notes and map each phrase to a cognitive functioning concept. Concepts derived from the NLP system were used to predict future MCI. The prediction model was trained on the ACT cohort and 60% of the general population cohort with 40% withheld for validation. We used a least absolute shrinkage and selection operator logistic regression approach (LASSO) to fit a prediction model with MCI as the prediction target. Using the predicted case status from the LASSO model and known MCI from standardized scores, we constructed receiver operating curves to measure model performance. RESULTS: Chart abstraction identified 42 MCI concepts. Prediction model performance in the validation data set was modest with an area under the curve of 0.67. Setting the cutoff for correct classification at 0.60, the classifier yielded sensitivity of 1.7%, specificity of 99.7%, PPV of 70% and NPV of 70.5% in the validation cohort. 
DISCUSSION AND CONCLUSION: Although the sensitivity of the machine learning model was poor, negative predictive value was high, an important characteristic of models used for population-based screening. While an AUC of 0.67 is generally considered moderate performance, it is also comparable to several tests that are widely used in clinical practice.


Subject(s)
Alzheimer Disease , Cognitive Dysfunction , Alzheimer Disease/diagnosis , Cognitive Dysfunction/diagnosis , Humans , Machine Learning , Mass Screening , Natural Language Processing
7.
Subst Abus ; 43(1): 917-924, 2022.
Article in English | MEDLINE | ID: mdl-35254218

ABSTRACT

Background: Most states have legalized medical cannabis, yet little is known about how medical cannabis use is documented in patients' electronic health records (EHRs). We used natural language processing (NLP) to calculate the prevalence of clinician-documented medical cannabis use among adults in an integrated health system in Washington State where medical and recreational use are legal. Methods: We analyzed EHRs of patients ≥18 years old screened for past-year cannabis use (November 1, 2017-October 31, 2018), to identify clinician-documented medical cannabis use. We defined medical use as any documentation of cannabis that was recommended by a clinician or described by the clinician or patient as intended to manage health conditions or symptoms. We developed and applied an NLP system that included NLP-assisted manual review to identify such documentation in encounter notes. Results: Medical cannabis use was documented for 16,684 (5.6%) of 299,597 outpatient encounters with routine screening for cannabis use among 203,489 patients seeing 1,274 clinicians. The validated NLP system identified 54% of documentation and NLP-assisted manual review the remainder. Language documenting reasons for cannabis use included 125 terms indicating medical use, 28 terms indicating non-medical use and 41 ambiguous terms. Implicit documentation of medical use (e.g., "edible THC nightly for lumbar pain") was more common than explicit (e.g., "continues medical cannabis use"). Conclusions: Clinicians use diverse and often ambiguous language to document patients' reasons for cannabis use. Automating extraction of documentation about patients' cannabis use could facilitate clinical decision support and epidemiological investigation but will require large amounts of gold standard training data.


Subject(s)
Medical Marijuana , Natural Language Processing , Adolescent , Adult , Documentation , Humans , Medical Marijuana/therapeutic use , Patient Reported Outcome Measures , Primary Health Care
8.
JAMA Netw Open ; 4(5): e219375, 2021 05 03.
Article in English | MEDLINE | ID: mdl-33956129

ABSTRACT

Importance: Many people use cannabis for medical reasons despite limited evidence of therapeutic benefit and potential risks. Little is known about medical practitioners' documentation of medical cannabis use or clinical characteristics of patients with documented medical cannabis use. Objectives: To estimate the prevalence of past-year medical cannabis use documented in electronic health records (EHRs) and to describe patients with EHR-documented medical cannabis use, EHR-documented cannabis use without evidence of medical use (other cannabis use), and no EHR-documented cannabis use. Design, Setting, and Participants: This cross-sectional study assessed adult primary care patients who completed a cannabis screen during a visit between November 1, 2017, and October 31, 2018, at a large health system that conducts routine cannabis screening in a US state with legal medical and recreational cannabis use. Exposures: Three mutually exclusive categories of EHR-documented cannabis use (medical, other, and no use) based on practitioner documentation of medical cannabis use in the EHR and patient report of past-year cannabis use at screening. Main Outcomes and Measures: Health conditions for which cannabis use has potential benefits or risks were defined based on National Academies of Sciences, Engineering, and Medicine's review. The adjusted prevalence of conditions diagnosed in the prior year were estimated across 3 categories of EHR-documented cannabis use with logistic regression. Results: A total of 185 565 patients (mean [SD] age, 52.0 [18.1] years; 59% female, 73% White, 94% non-Hispanic, and 61% commercially insured) were screened for cannabis use in a primary care visit during the study period. Among these patients, 3551 (2%) had EHR-documented medical cannabis use, 36 599 (20%) had EHR-documented other cannabis use, and 145 415 (78%) had no documented cannabis use. 
Patients with medical cannabis use had a higher prevalence of health conditions for which cannabis has potential benefits (49.8%; 95% CI, 48.3%-51.3%) compared with patients with other cannabis use (39.9%; 95% CI, 39.4%-40.3%) or no cannabis use (40.0%; 95% CI, 39.8%-40.2%). In addition, patients with medical cannabis use had a higher prevalence of health conditions for which cannabis has potential risks (60.7%; 95% CI, 59.0%-62.3%) compared with patients with other cannabis use (50.5%; 95% CI, 50.0%-51.0%) or no cannabis use (42.7%; 95% CI, 42.4%-42.9%). Conclusions and Relevance: In this cross-sectional study, primary care patients with documented medical cannabis use had a high prevalence of health conditions for which cannabis use has potential benefits, yet a higher prevalence of conditions with potential risks from cannabis use. These findings suggest that practitioners should be prepared to discuss potential risks and benefits of cannabis use with patients.


Subject(s)
Electronic Health Records/statistics & numerical data , Medical Marijuana/therapeutic use , Primary Health Care/statistics & numerical data , Adolescent , Adult , Aged , Cross-Sectional Studies , Female , Humans , Male , Middle Aged , Risk Assessment , Treatment Outcome , Washington/epidemiology , Young Adult
9.
Genet Epidemiol ; 45(1): 4-15, 2021 02.
Article in English | MEDLINE | ID: mdl-32964493

ABSTRACT

Carotid artery atherosclerotic disease (CAAD) is a risk factor for stroke. We used a genome-wide association (GWAS) approach to discover genetic variants associated with CAAD in participants in the electronic Medical Records and Genomics (eMERGE) Network. We identified adult CAAD cases with unilateral or bilateral carotid artery stenosis and controls without evidence of stenosis from electronic health records at eight eMERGE sites. We performed GWAS with a model adjusting for age, sex, study site, and genetic principal components of ancestry. In eMERGE we found 1793 CAAD cases and 17,958 controls. Two loci reached genome-wide significance, on chr6 in LPA (rs10455872, odds ratio [OR] (95% confidence interval [CI]) = 1.50 (1.30-1.73), p = 2.1 × 10⁻⁸) and on chr7, an intergenic single nucleotide variant (SNV; rs6952610, OR (95% CI) = 1.25 (1.16-1.36), p = 4.3 × 10⁻⁸). The chr7 association remained significant in the presence of the LPA SNV as a covariate. The LPA SNV was also associated with coronary heart disease (CHD; 4199 cases and 11,679 controls) in this study (OR (95% CI) = 1.27 (1.13-1.43), p = 5 × 10⁻⁵) but the chr7 SNV was not (OR (95% CI) = 1.03 (0.97-1.09), p = .37). Both variants replicated in UK Biobank. Elevated lipoprotein(a) concentrations ([Lp(a)]) and LPA variants associated with elevated [Lp(a)] have previously been associated with CAAD and CHD, including rs10455872. With electronic health record phenotypes in eMERGE and UKB, we replicated a previously known association and identified a novel locus associated with CAAD.


Subject(s)
Carotid Stenosis , Genome-Wide Association Study , Electronic Health Records , Genetic Predisposition to Disease , Genomics , Humans , Lipoprotein(a)/genetics , Genetic Models , Single Nucleotide Polymorphism
10.
J Am Med Inform Assoc ; 27(9): 1374-1382, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32930712

ABSTRACT

OBJECTIVE: Effective, scalable de-identification of personally identifying information (PII) for information-rich clinical text is critical to support secondary use, but no method is 100% effective. The hiding-in-plain-sight (HIPS) approach attempts to solve this "residual PII problem." HIPS replaces PII tagged by a de-identification system with realistic but fictitious (resynthesized) content, making it harder to detect remaining unredacted PII. MATERIALS AND METHODS: Using 2000 representative clinical documents from 2 healthcare settings (4000 total), we used a novel method to generate 2 de-identified 100-document corpora (200 documents total) in which PII tagged by a typical automated machine-learned tagger was replaced by HIPS-resynthesized content. Four readers conducted aggressive reidentification attacks to isolate leaked PII: 2 readers from within the originating institution and 2 external readers. RESULTS: Overall, mean recall of leaked PII was 26.8% and mean precision was 37.2%. Mean recall was 9% (mean precision = 37%) for patient ages, 32% (mean precision = 26%) for dates, 25% (mean precision = 37%) for doctor names, 45% (mean precision = 55%) for organization names, and 23% (mean precision = 57%) for patient names. Recall was 32% (precision = 40%) for internal and 22% (precision = 33%) for external readers. DISCUSSION AND CONCLUSIONS: Approximately 70% of leaked PII "hiding" in a corpus de-identified with HIPS resynthesis is resilient to detection by human readers in a realistic, aggressive reidentification attack scenario, more than double the rate reported in previous studies but less than the rate reported for an attack assisted by machine learning methods.


Subject(s)
Confidentiality , Data Anonymization , Electronic Health Records , Computer Security , Humans , Natural Language Processing
11.
J Drug Assess ; 9(1): 97-105, 2020.
Article in English | MEDLINE | ID: mdl-32489718

ABSTRACT

Objective: Opioid surveillance in response to the opioid epidemic will benefit from scalable, automated algorithms for identifying patients with clinically documented signs of problem prescription opioid use. Existing algorithms lack accuracy. We sought to develop a high-sensitivity, high-specificity classification algorithm based on widely available structured health data to identify patients receiving chronic extended-release/long-acting (ER/LA) therapy with evidence of problem use to support subsequent epidemiologic investigations. Methods: Outpatient medical records of a probability sample of 2,000 Kaiser Permanente Washington patients receiving ≥60 days' supply of ER/LA opioids in a 90-day period from 1 January 2006 to 30 June 2015 were manually reviewed to determine the presence of clinically documented signs of problem use and used as a reference standard for algorithm development. Using 1,400 patients as training data, we constructed candidate predictors from demographic, enrollment, encounter, diagnosis, procedure, and medication data extracted from medical claims records or the equivalent from electronic health record (EHR) systems, and we used adaptive least absolute shrinkage and selection operator (LASSO) regression to develop a model. We evaluated this model in a comparable 600-patient validation set. We compared this model to ICD-9 diagnostic codes for opioid abuse, dependence, and poisoning. This study was registered with ClinicalTrials.gov as study NCT02667262 on 28 January 2016. Results: We operationalized 1,126 potential predictors characterizing patient demographics, procedures, diagnoses, timing, dose, and location of medication dispensing. The final model incorporating 53 predictors had a sensitivity of 0.582 at positive predictive value (PPV) of 0.572. ICD-9 codes for opioid abuse, dependence, and poisoning had a sensitivity of 0.390 at PPV of 0.599 in the same cohort. 
Conclusions: Scalable methods using widely available structured EHR/claims data to accurately identify problem opioid use among patients receiving long-term ER/LA therapy were unsuccessful. This approach may be useful for identifying patients needing clinical evaluation.

12.
J Am Med Inform Assoc ; 26(12): 1536-1544, 2019 12 01.
Article in English | MEDLINE | ID: mdl-31390016

ABSTRACT

OBJECTIVE: Clinical corpora can be deidentified using a combination of machine-learned automated taggers and hiding in plain sight (HIPS) resynthesis. The latter replaces detected personally identifiable information (PII) with random surrogates, allowing leaked PII to blend in or "hide in plain sight." We evaluated the extent to which a malicious attacker could expose leaked PII in such a corpus. MATERIALS AND METHODS: We modeled a scenario where an institution (the defender) externally shared an 800-note corpus of actual outpatient clinical encounter notes from a large, integrated health care delivery system in Washington State. These notes were deidentified by a machine-learned PII tagger and HIPS resynthesis. A malicious attacker obtained and performed a parrot attack intending to expose leaked PII in this corpus. Specifically, the attacker mimicked the defender's process by manually annotating all PII-like content in half of the released corpus, training a PII tagger on these data, and using the trained model to tag the remaining encounter notes. The attacker hypothesized that untagged identifiers would be leaked PII, discoverable by manual review. We evaluated the attacker's success using measures of leak-detection rate and accuracy. RESULTS: The attacker correctly hypothesized that 211 (68%) of 310 actual PII leaks in the corpus were leaks, and wrongly hypothesized that 191 resynthesized PII instances were also leaks. One-third of actual leaks remained undetected. DISCUSSION AND CONCLUSION: A malicious parrot attack to reveal leaked PII in clinical text deidentified by machine-learned HIPS resynthesis can attenuate but not eliminate the protective effect of HIPS deidentification.
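
The attacker's success can be restated as recall and precision using only the counts given in the abstract. The Python sketch below does this arithmetic. The precision figure is derived here and is not reported in the abstract; it assumes the 191 wrongly flagged resynthesized instances were the attacker's only false positives.

```python
# Counts quoted in the abstract
actual_leaks = 310        # PII instances that truly leaked
correctly_flagged = 211   # leaks the attacker correctly hypothesized
wrongly_flagged = 191     # resynthesized surrogates mistaken for leaks

# Recall: share of real leaks the attacker found (reported as 68%)
recall = correctly_flagged / actual_leaks

# Precision: share of the attacker's guesses that were real leaks
# (derived here under the assumption stated above)
precision = correctly_flagged / (correctly_flagged + wrongly_flagged)

# Leaks that stayed hidden (reported as "one-third")
undetected = actual_leaks - correctly_flagged
undetected_share = undetected / actual_leaks
```

Under that assumption, roughly half of the attacker's hypothesized leaks would be surrogates rather than real PII, which is the sense in which HIPS still offers protection after the attack.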


Subject(s)
Computer Security , Confidentiality , Data Anonymization , Electronic Health Records , Machine Learning , Personally Identifiable Information , Ambulatory Care Facilities , Delivery of Health Care , Humans , Washington
13.
Methods Inf Med ; 55(4): 356-64, 2016 Aug 05.
Article in English | MEDLINE | ID: mdl-27405787

ABSTRACT

BACKGROUND: Clinical text contains valuable information but must be de-identified before it can be used for secondary purposes. Accurate annotation of personally identifiable information (PII) is essential to the development of automated de-identification systems and to manual redaction of PII. Yet the accuracy of annotations may vary considerably across individual annotators and annotation is costly. As such, the marginal benefit of incorporating additional annotators has not been well characterized. OBJECTIVES: This study models the costs and benefits of incorporating increasing numbers of independent human annotators to identify the instances of PII in a corpus. We used a corpus with gold standard annotations to evaluate the performance of teams of annotators of increasing size. METHODS: Four annotators independently identified PII in a 100-document corpus consisting of randomly selected clinical notes from Family Practice clinics in a large integrated health care system. These annotations were pooled and validated to generate a gold standard corpus for evaluation. RESULTS: Recall rates for all PII types ranged from 0.90 to 0.98 for individual annotators to 0.998 to 1.0 for teams of three, when measured against the gold standard. Median cost per PII instance discovered during corpus annotation ranged from $0.71 for an individual annotator to $377 for annotations discovered only by a fourth annotator. CONCLUSIONS: Incorporating a second annotator into a PII annotation process reduces unredacted PII and improves the quality of annotations to 0.99 recall, yielding clear benefit at reasonable cost; the cost advantages of annotation teams larger than two diminish rapidly.
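
The diminishing return from extra annotators can be illustrated with a simple model in which a PII instance is found if at least one team member finds it. The Python sketch below assumes annotators miss instances independently, which is a simplification introduced here and not something the study states; the study measured pooled team recall empirically. The function name `team_recall` is hypothetical.

```python
from math import prod


def team_recall(individual_recalls):
    """Recall of a pooled team, ASSUMING annotators miss PII independently."""
    # An instance is missed only if every annotator misses it
    return 1.0 - prod(1.0 - r for r in individual_recalls)


# Lowest individual recall reported in the abstract, applied to teams of 1 to 4
for size in range(1, 5):
    print(size, round(team_recall([0.90] * size), 4))
```

Under this assumption, three annotators at 0.90 recall each would reach about 0.999, and a fourth adds almost nothing, which matches the direction of the cost finding reported above.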


Subject(s)
Cost-Benefit Analysis/economics , Data Mining/economics , Patient Identification Systems/economics , Electronic Health Records , Humans
...